Abstract —To enjoy more social network services, users nowadays
are usually involved in multiple online social networks
simultaneously. The shared users between different networks are 
called anchor users, while the remaining unshared users are 
named as non-anchor users. Connections between accounts of 
anchor users in different networks are defined as anchor links and 
networks partially aligned by anchor links can be represented 
as partially aligned networks. In this paper, we want to predict 
anchor links between partially aligned social networks, which 
is formally defined as the partial network alignment problem. 
The partial network alignment problem is very difficult to solve 
because of the following two challenges: (1) the lack of general 
features for anchor links, and (2) the "one-to-one$_\leq$" (one
to at most one) constraint on anchor links. To address these
two challenges, a new method PNA (Partial Network Aligner) is 
proposed in this paper. PNA (1) extracts a set of explicit anchor 
adjacency features and latent topological features for anchor links 
based on the anchor meta path concept and tensor decomposition 
techniques, and (2) utilizes the generic stable matching to identify 
the non-anchor users to prune the redundant anchor links 
attached to them. Extensive experiments conducted on two real-world
partially aligned social networks demonstrate that PNA
solves the partial network alignment problem well and
significantly outperforms all the other comparison methods.

Index Terms —Partial Network Alignment; Multiple Heterogeneous
Social Networks; Data Mining

I. Introduction 

In recent years, online social networks providing various
featured services have become an essential part of our lives.
To enjoy more social network services, users nowadays are
usually involved in multiple online social networks
simultaneously [?], [36], [?], [?], and there can be significant
overlaps of users shared by different networks. As pointed out in
[6], by the end of 2013, 42% of online adults were using multiple
social sites at the same time. For example, 93% of Instagram users
are involved in Facebook concurrently and 53% of Twitter users
are using Instagram as well [?]. Formally, the common users
involved in different networks simultaneously are named the
"anchor users" [?], while the remaining unshared users are
called the "non-anchor users" [42]. The connections between
accounts of anchor users in different networks are defined
as the "anchor links" [?], and networks partially aligned
by anchor links can be represented as "partially aligned
networks" [?].


Problem Studied: In this paper, we want to predict the anchor 
links across partially aligned networks, which is formally 
defined as the "partial network alignment" problem. 

The partial network alignment problem is very important for
social networks and can be a prerequisite for many real-world
social applications, e.g., link prediction and recommendations
[36], [?], [42], [40], community detection [?], [?], [?], and
information diffusion [?]. Identifying accounts
of anchor users across networks provides the opportunity to
compose a more complete social graph with users' information
in all the networks they are involved in. Information in the
complete social graph is helpful for a better understanding
of users' social behavior in online social networks [?], [?],
[?]. In addition, via the predicted anchor links, cross-platform
information exchange enables new social networks
to start their services based on the rich data available in
other developed networks. The information transferred from
developed networks can help emerging networks [?], [?]
overcome the information shortage problem [?], [?], [?].

What's more, the partial network alignment problem is
a novel problem and differs from existing link prediction
works, such as (1) traditional intra-network link prediction
problems [?], [?], which mainly focus on predicting links in one
single social network, (2) inter-network link transfer problems
[?], which predict links in one single network with
information from multiple aligned networks, and (3) inferring
anchor links across fully aligned networks [14], which aims
at predicting anchor links across fully aligned networks.

The "inferring anchor links across fully aligned networks"
problem [14] also studies the anchor link prediction problem.
However, both the problem setting and the method proposed to
address the "network alignment" problem between two fully
aligned networks in [14] are very ad hoc and have many
disadvantages. First of all, the full alignment assumption of
social networks proposed in [14] is too strong, as fully aligned
networks can hardly exist in the real world [?]. Secondly, the
features extracted for anchor links in [14] are proposed for
Foursquare and Twitter specifically, and are hard to
generalize to other networks. Thirdly, the classification-based
link prediction algorithm used in [14] can suffer from the
class imbalance problem [?], [?]. The problem will be more
serious when dealing with partially aligned networks. Finally,
the matching algorithm proposed in [14] is designed specially
for fully aligned networks and maps all users (including both
anchor and non-anchor users) from one network to another
network via the predicted anchor links, which will introduce
a large number of non-existing anchor links when applied to
the partial network alignment problem.

Totally different from the "inferring anchor links across
fully aligned networks" problem [14], we study a more general
network alignment problem in this paper. Firstly, networks
studied in this paper are partially aligned [42], and contain
a large number of anchor and non-anchor users [?] at the
same time. Secondly, networks studied are not confined to
the Foursquare and Twitter social networks. A minor revision
of the "partial network alignment" problem can be mapped
to many other existing tough problems, e.g., large biology
network alignment [?], entity resolution in database
integration [?], ontology matching [7], and various types of
entity matching in online social networks [22]. Thirdly, the
class imbalance problem will be addressed effectively via link
sampling in the paper. Finally, the constraint on anchor
links is "one-to-one$_\leq$" (i.e., each user in one network
can be mapped to at most one user in another network).
Across partially aligned networks, only anchor users can be
connected by anchor links. Identifying the non-anchor users
from networks and pruning all the predicted potential anchor
links connected to them is a novel yet challenging problem.
The "one-to-one$_\leq$" constraint on anchor links distinguishes
the "partial network alignment" problem from most
existing link prediction problems. For example, in traditional
link prediction and link transfer problems [26], [27], [37], the
constraint on links is "many-to-many", while in the "anchor
link inference" problem [14] across fully aligned networks, the
constraint on anchor links is strict "one-to-one".

To solve the "partial network alignment" problem, a new
method, PNA (Partial Network Aligner), is proposed in this
paper. PNA exploits the concept of anchor meta paths [42],
[26] and utilizes the tensor decomposition [?], [?] technique
to obtain a set of explicit anchor adjacency features
and latent topological features. In addition, PNA generalizes
the traditional stable matching to support partially aligned
networks through self matching and partial stable matching,
and introduces a novel matching method, generic stable
matching, in this paper.

The rest of this paper is organized as follows. In Section II,
we give the definitions of some important concepts and
formulate the partial network alignment problem. The PNA method
is introduced in Sections III and IV. Section V is about
the experiments. Related works are given in Section VI.
Finally, we conclude the paper in Section VII.

II. Problem Formulation 

Before introducing the method PNA, we will first define 
some important concepts and formulate the partial network 
alignment problem in this section. 


A. Terminology Definition 

Definition 1 (Heterogeneous Social Networks): A heterogeneous
social network can be represented as $G = (\mathcal{V}, \mathcal{E})$, where
$\mathcal{V} = \bigcup_i \mathcal{V}_i$ contains the sets of various kinds of nodes,
while $\mathcal{E} = \bigcup_j \mathcal{E}_j$ is the set of different types of links among
nodes in $\mathcal{V}$.

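Definition 2 (Aligned Heterogeneous Social Networks): Social
networks that share common users are defined as the aligned
heterogeneous social networks, which can be represented as
$\mathcal{G} = (G_{set}, \mathcal{A}_{set})$, where $G_{set} = \{G^{(1)}, G^{(2)}, \cdots, G^{(n)}\}$
is the set of $n$ different heterogeneous social networks and
$\mathcal{A}_{set} = \{\mathcal{A}^{(1,2)}, \mathcal{A}^{(1,3)}, \cdots, \mathcal{A}^{((n-1),n)}\}$ is the set of
undirected anchor link sets between networks in $G_{set}$.

Definition 3 (Anchor Link): Given two social networks $G^{(i)}$
and $G^{(j)}$, link $(u^{(i)}, v^{(j)})$ is an anchor link between $G^{(i)}$ and
$G^{(j)}$ iff $(u^{(i)} \in \mathcal{U}^{(i)}) \land (v^{(j)} \in \mathcal{U}^{(j)}) \land$ ($u^{(i)}$ and $v^{(j)}$ are
accounts of the same user), where $\mathcal{U}^{(i)}$ and $\mathcal{U}^{(j)}$ are the user
sets of $G^{(i)}$ and $G^{(j)}$ respectively.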

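Definition 4 (Anchor Users and Non-anchor Users): Users
who are involved in two social networks, e.g., $G^{(i)}$ and $G^{(j)}$,
simultaneously are defined as the anchor users between $G^{(i)}$
and $G^{(j)}$. Anchor users in $G^{(i)}$ between $G^{(i)}$ and $G^{(j)}$ can be
represented as $\mathcal{U}^{(i)}_{\mathcal{A}^{(i,j)}} = \{u^{(i)} \mid u^{(i)} \in \mathcal{U}^{(i)}, \exists v^{(j)} \in \mathcal{U}^{(j)}, \text{ and } (u^{(i)}, v^{(j)}) \in \mathcal{A}^{(i,j)}\}$.
Meanwhile, the non-anchor users in $G^{(i)}$
between $G^{(i)}$ and $G^{(j)}$ are those who are involved in $G^{(i)}$ only
and can be represented as $\mathcal{U}^{(i)}_{-\mathcal{A}^{(i,j)}} = \mathcal{U}^{(i)} - \mathcal{U}^{(i)}_{\mathcal{A}^{(i,j)}}$. Similarly,
the anchor users and non-anchor users in $G^{(j)}$ between $G^{(j)}$
and $G^{(i)}$ can be defined as $\mathcal{U}^{(j)}_{\mathcal{A}^{(i,j)}}$ and $\mathcal{U}^{(j)}_{-\mathcal{A}^{(i,j)}}$ respectively.

Definition 5 (Full Alignment, Partial Alignment and Isolated):
Given two social networks $G^{(i)}$ and $G^{(j)}$, if users in both
$G^{(i)}$ and $G^{(j)}$ are all anchor users, i.e., $\mathcal{U}^{(i)}_{\mathcal{A}^{(i,j)}} = \mathcal{U}^{(i)}$ and
$\mathcal{U}^{(j)}_{\mathcal{A}^{(i,j)}} = \mathcal{U}^{(j)}$, then $G^{(i)}$ and $G^{(j)}$ are fully aligned; if users
in both of these two networks are all non-anchor users, i.e.,
$\mathcal{U}^{(i)}_{-\mathcal{A}^{(i,j)}} = \mathcal{U}^{(i)}$ and $\mathcal{U}^{(j)}_{-\mathcal{A}^{(i,j)}} = \mathcal{U}^{(j)}$, then these two networks
are isolated; otherwise, they are partially aligned.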
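
Definitions 4 and 5 can be illustrated with a minimal Python sketch. The function names and the toy user identifiers below are hypothetical and chosen only for illustration; the sketch assumes that users are hashable identifiers and that anchor links are given as (user in $G^{(i)}$, user in $G^{(j)}$) pairs.

```python
def anchor_users(users_i, users_j, anchor_links):
    """Return the anchor users of G_i and of G_j induced by the anchor links."""
    valid = [(u, v) for (u, v) in anchor_links if u in users_i and v in users_j]
    anchors_i = {u for (u, _) in valid}
    anchors_j = {v for (_, v) in valid}
    return anchors_i, anchors_j


def alignment_type(users_i, users_j, anchor_links):
    """Classify two networks as fully aligned, isolated or partially aligned."""
    anchors_i, anchors_j = anchor_users(users_i, users_j, anchor_links)
    if anchors_i == users_i and anchors_j == users_j:
        return "fully aligned"      # every user is an anchor user
    if not anchors_i and not anchors_j:
        return "isolated"           # every user is a non-anchor user
    return "partially aligned"


# Toy example (hypothetical identifiers).
users_i = {"a1", "a2", "a3"}
users_j = {"b1", "b2"}
links = {("a1", "b1")}
anchors_i, anchors_j = anchor_users(users_i, users_j, links)
non_anchors_i = users_i - anchors_i  # users involved in G_i only
```

With a single anchor link, the two toy networks are partially aligned, and `non_anchors_i` contains the two users of $G^{(i)}$ that should not receive any anchor link.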

Definition 6 (Bridge Nodes): Besides users, many other kinds
of nodes can be shared between different networks, which are
defined as the bridge nodes in this paper. The bridge nodes
shared between $G^{(i)}$ and $G^{(j)}$ can be represented as
$\mathcal{B}^{(i,j)} = \{v \mid (v \in (\mathcal{V}^{(i)} - \mathcal{U}^{(i)})) \land (v \in (\mathcal{V}^{(j)} - \mathcal{U}^{(j)}))\}$.

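The social networks studied in this paper can be any
partially aligned social networks, and we use Foursquare and
Twitter as an example to illustrate the studied problem and the
proposed method. Users in both Foursquare and Twitter can
make friends with other users and write posts, which can contain
words, timestamps, and location checkins. In addition, users
in Foursquare can also create lists of locations that they have
visited or want to visit in the future. As a result, Foursquare and
Twitter can be represented as heterogeneous social networks
$G = (\mathcal{V}, \mathcal{E})$. In Twitter $\mathcal{V} = \mathcal{U} \cup \mathcal{P} \cup \mathcal{W} \cup \mathcal{T} \cup \mathcal{L}$ and in
Foursquare $\mathcal{V} = \mathcal{U} \cup \mathcal{P} \cup \mathcal{W} \cup \mathcal{T} \cup \mathcal{I} \cup \mathcal{L}$, where $\mathcal{U}$, $\mathcal{P}$, $\mathcal{W}$,
$\mathcal{T}$, $\mathcal{I}$ and $\mathcal{L}$ are the node sets of users, posts, words, timestamps,
lists and locations. In Twitter, the heterogeneous link
set is $\mathcal{E} = \mathcal{E}_{u,u} \cup \mathcal{E}_{u,p} \cup \mathcal{E}_{p,w} \cup \mathcal{E}_{p,t} \cup \mathcal{E}_{p,l}$, and in Foursquare
$\mathcal{E} = \mathcal{E}_{u,u} \cup \mathcal{E}_{u,p} \cup \mathcal{E}_{p,w} \cup \mathcal{E}_{p,t} \cup \mathcal{E}_{p,l} \cup \mathcal{E}_{u,i} \cup \mathcal{E}_{i,l}$. The bridge nodes
shared between Foursquare and Twitter include the common
locations, common words and common timestamps.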

B. Problem Statement 

Definition 7 (Partial Network Alignment): For any two given
partially aligned heterogeneous social networks, e.g.,
$\mathcal{G} = ((G^{(i)}, G^{(j)}), (\mathcal{A}^{(i,j)}))$, part of the known anchor links
between $G^{(i)}$ and $G^{(j)}$ are represented as $\mathcal{A}^{(i,j)}$. Let $\mathcal{U}^{(i)}$, $\mathcal{U}^{(j)}$ be
the user sets of $G^{(i)}$ and $G^{(j)}$ respectively; the set of other
potential anchor links between $G^{(i)}$ and $G^{(j)}$ can be represented
as $\mathcal{L}^{(i,j)} = \mathcal{U}^{(i)} \times \mathcal{U}^{(j)} - \mathcal{A}^{(i,j)}$.

We solve the partial network alignment problem as a link
classification problem, where existing and non-existing anchor
links are labeled as "+1" and "-1" respectively. In this paper,
we aim at building a model $\mathcal{M}$ with the existing anchor links
$\mathcal{A}^{(i,j)}$, which will be applied to predict potential anchor links
in $\mathcal{L}^{(i,j)}$. In model $\mathcal{M}$, we want to determine both labels and
existence probabilities of anchor links in $\mathcal{L}^{(i,j)}$.


III. Feature Extraction and Anchor Link 
Prediction 

Supervised link prediction methods have been widely used
in research due to their excellent performance and the profound
theoretical basis of supervised learning. In supervised link
prediction, links are labeled differently according to their
physical meanings, e.g., existing vs non-existent [?], friends
vs enemies [?], trust vs distrust [32], positive attitude vs
negative attitude [33]. With information in the networks, a
set of heterogeneous features can be extracted for links in the
training set, which together with the labels are used to build
the link prediction model $\mathcal{M}$.

In this section, we will introduce different categories of 
general features extracted for anchor links across partially 
aligned networks, which include a set of explicit anchor 
adjacency features based on anchor meta paths and the ''latent 
topological feature vector'' extracted via anchor adjacency 
tensor decomposition. 

A. Traditional Intra-Network Meta Path 

Traditional meta paths are mainly defined based on the
social network schema of one single network [26], [28].

Definition 8 (Social Network Schema): For a given network
$G$, its schema is defined as $S_G = (\mathcal{T}_G, \mathcal{R}_G)$, where $\mathcal{T}_G$ and
$\mathcal{R}_G$ are the sets of node types and link types in $G$ respectively.

Definition 9 (Meta Path): Based on the schema of network $G$,
i.e., $S_G = (\mathcal{T}_G, \mathcal{R}_G)$, the traditional intra-network meta path in
$G$ is defined as $\Phi = T_1 \xrightarrow{R_1} T_2 \xrightarrow{R_2} \cdots \xrightarrow{R_{k-1}} T_k$, where
$T_i \in \mathcal{T}_G, i \in \{1, 2, \cdots, k\}$ and $R_j \in \mathcal{R}_G, j \in \{1, 2, \cdots, k-1\}$
[?], [?].

For example, according to the networks introduced in Section II,
we can define the network schema of Twitter as $S_G =
(\{User, Post, Word, Timestamp, List, Location\}, \{Follow,
Write, Create, Contain, At, Checkin\})$. Based on the
schema, "User - Location - User" is a meta path of length 2
connecting user nodes in the network via a location node, and
path "Alice - San Jose - Bob" is an instance of such a meta
path in the network, where Alice, Bob and San Jose are the
users and location in the network.


B. Inter-Network Anchor Meta Path 

Traditional intra-network meta paths defined based on one
single network cannot be applied to address the inter-network
partial network alignment problem directly. To overcome this
problem, in this subsection, we define the concept of
anchor meta paths and introduce a set of inter-network anchor
meta paths [?] across partially aligned networks.

Definition 10 (Aligned Social Network Schema): Given the
partially aligned networks $\mathcal{G} = (G_{set}, \mathcal{A}_{set})$, let $S_{G^{(i)}} =
(\mathcal{T}_{G^{(i)}}, \mathcal{R}_{G^{(i)}})$ be the schema of network $G^{(i)} \in G_{set}$; the
schema of partially aligned networks $\mathcal{G}$ can be defined as
$S_{\mathcal{G}} = \left(\bigcup_i \mathcal{T}_{G^{(i)}}, \left(\bigcup_i \mathcal{R}_{G^{(i)}}\right) \cup \{Anchor\}\right)$, where $\{Anchor\}$
is the anchor link type.

An example of the schema of two partially aligned
social networks, e.g., $G^{(i)}$ (e.g., Foursquare) and $G^{(j)}$ (e.g.,
Twitter), is shown in the corresponding figure, where the schemas
of these two aligned networks are connected by the anchor link
type and the green dashed circles are the shared bridge nodes
between $G^{(i)}$ and $G^{(j)}$.

Definition 11 (AMP: Anchor Meta Path): Based on the aligned
social network schema, an anchor meta path connecting users
across $\mathcal{G}$ is defined to be $\Psi = T_1 \xrightarrow{R_1} T_2 \xrightarrow{R_2} \cdots \xrightarrow{R_{k-1}} T_k$,
where $T_1$ and $T_k$ are the "User" node type in two partially
aligned social networks respectively. To differentiate
the anchor link type from other link types in the anchor
meta path, the direction of $R_i$ in $\Psi$ will be bidirectional if
$R_i = Anchor$, $i \in \{1, 2, \cdots, k-1\}$, i.e., $T_i \xleftrightarrow{R_i} T_{i+1}$.

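Via the instances of anchor meta paths, users across aligned
social networks can be extensively connected to each other.
In the two partially aligned social networks (e.g., $\mathcal{G} =
((G^{(i)}, G^{(j)}), (\mathcal{A}^{(i,j)}))$) studied in this paper, various anchor
meta paths from $G^{(i)}$ (i.e., Foursquare) to $G^{(j)}$ (i.e., Twitter)
can be defined as follows:

• Common Out Neighbor Anchor Meta Path ($\Psi_1$):
  $User^{(i)} \xrightarrow{follow} User^{(i)} \xleftrightarrow{Anchor} User^{(j)} \xleftarrow{follow} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \rightarrow \mathcal{U}^{(i)} \leftrightarrow \mathcal{U}^{(j)} \leftarrow \mathcal{U}^{(j)}$" for short.

• Common In Neighbor Anchor Meta Path ($\Psi_2$):
  $User^{(i)} \xleftarrow{follow} User^{(i)} \xleftrightarrow{Anchor} User^{(j)} \xrightarrow{follow} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \leftarrow \mathcal{U}^{(i)} \leftrightarrow \mathcal{U}^{(j)} \rightarrow \mathcal{U}^{(j)}$".

• Common Out In Neighbor Anchor Meta Path ($\Psi_3$):
  $User^{(i)} \xrightarrow{follow} User^{(i)} \xleftrightarrow{Anchor} User^{(j)} \xrightarrow{follow} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \rightarrow \mathcal{U}^{(i)} \leftrightarrow \mathcal{U}^{(j)} \rightarrow \mathcal{U}^{(j)}$".

• Common In Out Neighbor Anchor Meta Path ($\Psi_4$):
  $User^{(i)} \xleftarrow{follow} User^{(i)} \xleftrightarrow{Anchor} User^{(j)} \xleftarrow{follow} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \leftarrow \mathcal{U}^{(i)} \leftrightarrow \mathcal{U}^{(j)} \leftarrow \mathcal{U}^{(j)}$".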

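The above anchor meta paths are all defined based on
the "User" node type only across partially aligned social
networks. Furthermore, there can exist many other anchor
meta paths consisting of the user node type and other bridge
node types from Foursquare to Twitter, e.g., Location, Word and
Timestamp.

• Common Location Checkin Anchor Meta Path 1 ($\Psi_5$):
  $User^{(i)} \xrightarrow{write} Post^{(i)} \xrightarrow{checkin\ at} Location \xleftarrow{checkin\ at} Post^{(j)} \xleftarrow{write} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \rightarrow \mathcal{P}^{(i)} \rightarrow \mathcal{L} \leftarrow \mathcal{P}^{(j)} \leftarrow \mathcal{U}^{(j)}$".

• Common Location Checkin Anchor Meta Path 2 ($\Psi_6$):
  $User^{(i)} \xrightarrow{create} List^{(i)} \xrightarrow{contain} Location \xleftarrow{checkin\ at} Post^{(j)} \xleftarrow{write} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \rightarrow \mathcal{I}^{(i)} \rightarrow \mathcal{L} \leftarrow \mathcal{P}^{(j)} \leftarrow \mathcal{U}^{(j)}$".

• Common Timestamps Anchor Meta Path ($\Psi_7$):
  $User^{(i)} \xrightarrow{write} Post^{(i)} \xrightarrow{at} Time \xleftarrow{at} Post^{(j)} \xleftarrow{write} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \rightarrow \mathcal{P}^{(i)} \rightarrow \mathcal{T} \leftarrow \mathcal{P}^{(j)} \leftarrow \mathcal{U}^{(j)}$".

• Common Word Usage Anchor Meta Path ($\Psi_8$):
  $User^{(i)} \xrightarrow{write} Post^{(i)} \xrightarrow{contain} Word \xleftarrow{contain} Post^{(j)} \xleftarrow{write} User^{(j)}$, or
  "$\mathcal{U}^{(i)} \rightarrow \mathcal{P}^{(i)} \rightarrow \mathcal{W} \leftarrow \mathcal{P}^{(j)} \leftarrow \mathcal{U}^{(j)}$".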

C. Explicit Anchor Adjacency Features

Based on the above defined anchor meta paths, different
kinds of anchor meta path based adjacency relationships can
be extracted from the network. In this paper, we define the
new concepts of anchor adjacency score, anchor adjacency
tensor and explicit anchor adjacency features to describe
such relationships among users across partially aligned social
networks.

Definition 12 (Anchor Meta Path Instance): Based on anchor
meta path $\Psi = T_1 \xrightarrow{R_1} T_2 \xrightarrow{R_2} \cdots \xrightarrow{R_{k-1}} T_k$, path $\psi =
n_1 - n_2 - \cdots - n_{k-1} - n_k$ is an instance of $\Psi$ iff $n_i$ is an
instance of node type $T_i$, $\forall i \in \{1, 2, \cdots, k\}$, and $(n_i, n_{i+1})$ is
an instance of link type $R_i$, $\forall i \in \{1, 2, \cdots, k-1\}$.

Definition 13 (AAS: Anchor Adjacency Score): The anchor
adjacency score is quantified as the number of anchor meta
path instances of various anchor meta paths connecting users
across networks. The anchor adjacency score between $u^{(i)} \in \mathcal{U}^{(i)}$
and $v^{(j)} \in \mathcal{U}^{(j)}$ based on meta path $\Psi$ is defined as:

$$AAS_{\Psi}(u^{(i)}, v^{(j)}) = \left| \{\psi \mid (\psi \in \Psi) \land (u^{(i)} \in T_1) \land (v^{(j)} \in T_k)\} \right|,$$

where path $\psi$ starts and ends with node types $T_1$ and $T_k$
respectively and $\psi \in \Psi$ denotes that $\psi$ is a path instance
of meta path $\Psi$.

The anchor adjacency scores among all users across partially
aligned networks can be stored in the anchor adjacency
matrix as follows.

Definition 14 (AAM: Anchor Adjacency Matrix): Given a
certain anchor meta path $\Psi$, the anchor adjacency matrix
between $G^{(i)}$ and $G^{(j)}$ can be defined as $A_{\Psi} \in \mathbb{N}^{|\mathcal{U}^{(i)}| \times |\mathcal{U}^{(j)}|}$
and $A_{\Psi}(l, m) = AAS_{\Psi}(u_l^{(i)}, u_m^{(j)})$, $u_l^{(i)} \in \mathcal{U}^{(i)}$, $u_m^{(j)} \in \mathcal{U}^{(j)}$.

Multiple anchor adjacency matrices can be grouped together
to form a high-order tensor. A tensor is a multidimensional
array and an N-order tensor is an element of the tensor product
of N vector spaces, each of which can have its own coordinate
system. As a result, a 1-order tensor is a vector, a 2-order
tensor is a matrix, and tensors of order three or higher are
called higher-order tensors [?], [?].

Definition 15 (AAT: Anchor Adjacency Tensor): Based on
meta paths in $\{\Psi_1, \Psi_2, \cdots, \Psi_8\}$, we can obtain a set of anchor
adjacency matrices between users in two partially aligned
networks, $\{A_{\Psi_1}, A_{\Psi_2}, \cdots, A_{\Psi_8}\}$. With $\{A_{\Psi_1}, A_{\Psi_2},
\cdots, A_{\Psi_8}\}$, we can construct a 3-order anchor adjacency
tensor $\mathcal{X} \in \mathbb{R}^{|\mathcal{U}^{(i)}| \times |\mathcal{U}^{(j)}| \times 8}$, where the $k$-th layer of $\mathcal{X}$ is the
anchor adjacency matrix based on anchor meta path $\Psi_k$, i.e.,
$\mathcal{X}(:, :, k) = A_{\Psi_k}$, $k \in \{1, 2, \cdots, 8\}$.

Based on the anchor adjacency tensor, a set of explicit
anchor adjacency features can be extracted for anchor links
across partially aligned social networks.

Definition 16 (EAAF: Explicit Anchor Adjacency Features):
For a certain anchor link $(u_l^{(i)}, u_m^{(j)})$, the explicit anchor
adjacency feature vector extracted based on the anchor adjacency
tensor $\mathcal{X}$ can be represented as $\mathbf{x} = [x_1, x_2, \cdots, x_8]$ (i.e., the
anchor adjacency scores between $u_l^{(i)}$ and $u_m^{(j)}$ based on 8
different anchor meta paths), where $x_k = \mathcal{X}(l, m, k)$, $k \in
\{1, 2, \cdots, 8\}$.
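
As an illustration of how anchor adjacency scores can be computed, the following sketch counts path instances of the four user-only anchor meta paths by chaining adjacency matrices. It is a minimal sketch under stated assumptions, not the authors' implementation: the matrix names `follow_i`, `follow_j` and `anchor`, the function name, and the toy data are all hypothetical, and only four of the eight layers of the tensor are built.

```python
import numpy as np

# Toy data (hypothetical): 3 users in G_i and 2 users in G_j.
# follow_i[a, b] = 1 means user a follows user b inside G_i.
follow_i = np.array([[0, 1, 0],
                     [0, 0, 0],
                     [0, 1, 0]])
follow_j = np.array([[0, 1],
                     [0, 0]])
# anchor[a, b] = 1 means user a of G_i and user b of G_j are a known anchor pair.
anchor = np.array([[0, 0],
                   [0, 1],
                   [0, 0]])


def user_only_anchor_adjacency(follow_i, follow_j, anchor):
    """Stack the anchor adjacency matrices of the user-only paths into a tensor."""
    psi1 = follow_i @ anchor @ follow_j.T    # U_i -> U_i <-> U_j <- U_j
    psi2 = follow_i.T @ anchor @ follow_j    # U_i <- U_i <-> U_j -> U_j
    psi3 = follow_i @ anchor @ follow_j      # U_i -> U_i <-> U_j -> U_j
    psi4 = follow_i.T @ anchor @ follow_j.T  # U_i <- U_i <-> U_j <- U_j
    # Entry (l, m, k) is the number of path instances of the k-th path
    # that connect user l of G_i with user m of G_j.
    return np.stack([psi1, psi2, psi3, psi4], axis=2)


tensor = user_only_anchor_adjacency(follow_i, follow_j, anchor)
features_00 = tensor[0, 0]  # explicit feature vector of candidate link (0, 0)
```

Each matrix product sums over the intermediate users, so an entry equals the number of path instances between the two end users. The remaining four paths could be handled the same way with user-post, post-location, post-timestamp and post-word incidence matrices.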


D. Latent Topological Feature Vectors Extraction 

Explicit anchor adjacency features can express manifest 
properties of the connections across partially aligned networks 
and are the explicit topological features. Besides explicit topo¬ 
logical connections, there can also exist some hidden common 
connection patterns across partially aligned networks. In 
this paper, we also propose to extract the latent topological 
feature vectors from the anchor adjacency tensor. 

As proposed in [?], [?], a higher-order tensor can be
decomposed into a core tensor, e.g., $\mathcal{G}$, multiplied by a matrix
along each mode, e.g., $A, B, \cdots, Z$, with various tensor
decomposition methods, e.g., Tucker decomposition [?]. For
example, the 3-order anchor adjacency tensor $\mathcal{X}$ can be
decomposed into three matrices $A \in \mathbb{R}^{|\mathcal{U}^{(i)}| \times P}$, $B \in \mathbb{R}^{|\mathcal{U}^{(j)}| \times Q}$
and $C \in \mathbb{R}^{8 \times R}$ and a core tensor $\mathcal{G} \in \mathbb{R}^{P \times Q \times R}$, where
$P, Q, R$ are the numbers of columns of matrices $A, B, C$ [?]:

$$\mathcal{X} = \sum_{p=1}^{P} \sum_{q=1}^{Q} \sum_{r=1}^{R} g_{pqr} \, \mathbf{a}_p \circ \mathbf{b}_q \circ \mathbf{c}_r = [\mathcal{G}; A, B, C],$$

where $\mathbf{a}_p \circ \mathbf{b}_q$ denotes the vector outer product of $\mathbf{a}_p$ and $\mathbf{b}_q$.
Each row of $A$ and $B$ represents a latent topological
feature vector of users in $\mathcal{U}^{(i)}$ and $\mathcal{U}^{(j)}$ respectively [?].
The HOSVD method introduced in [?] is applied to obtain these
decomposed matrices in this paper.

Fig. 2. Instance distribution in feature space.
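
The decomposition above can be sketched with NumPy as a truncated higher-order SVD. This is a minimal illustration under stated assumptions rather than the exact procedure used in the paper: the helper names `unfold`, `hosvd` and `reconstruct` are hypothetical, the tensor is random toy data, and the ranks are chosen arbitrarily.

```python
import numpy as np


def unfold(tensor, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def hosvd(tensor, ranks):
    """Truncated HOSVD of a 3-order tensor; returns (core, [A, B, C])."""
    factors = []
    for mode, rank in enumerate(ranks):
        # Leading left singular vectors of the mode-n unfolding.
        u, _, _ = np.linalg.svd(unfold(tensor, mode), full_matrices=False)
        factors.append(u[:, :rank])
    a, b, c = factors
    # Core tensor: project the tensor onto the factor matrices along each mode.
    core = np.einsum("lmk,lp,mq,kr->pqr", tensor, a, b, c)
    return core, factors


def reconstruct(core, factors):
    """Rebuild the tensor from the core tensor and the factor matrices."""
    a, b, c = factors
    return np.einsum("pqr,lp,mq,kr->lmk", core, a, b, c)


rng = np.random.default_rng(0)
x = rng.random((4, 3, 8))                # toy anchor adjacency tensor
core, (a, b, c) = hosvd(x, ranks=(2, 2, 2))
latent_u_i = a                           # one latent feature row per user in G_i
latent_u_j = b                           # one latent feature row per user in G_j
```

With full ranks the reconstruction reproduces the input tensor; with smaller ranks the rows of `a` and `b` act as compact latent topological feature vectors for the users of the two networks.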

E. Class Imbalance Link Prediction 

Based on the extracted features, various supervised link
prediction models [?], [?], [42] can be applied to infer the
potential anchor links across networks. As proposed in [?],
[?], conventional supervised link prediction methods [29]
can suffer a lot from the class imbalance problem. To address
the problem, two effective methods (down sampling [?] and
over sampling [?]) are applied.

Down sampling methods aim at deleting the unreliable
negative instances from the training set. In Figure 2, we show
the distribution of training links in the feature space, where
negative links can be divided into 4 different categories [?]:
(1) noisy links: links mixed in the positive links; (2) borderline
links: links close to the decision boundary; (3) redundant links:
links which are too far away from the decision boundary in
the negative region; and (4) safe links: links which are helpful
for determining the classification boundary.

Different heuristics have been proposed to remove the noisy
instances and borderline instances, which are detrimental to
the learning algorithms. In this paper, we use the method
called Tomek links proposed in [30], [?]. For any two given
instances $x_1$ and $x_2$ of different labels, pair $(x_1, x_2)$ is called
a Tomek link if there exists no other instance, e.g., $z$, such
that $d(x_1, z) < d(x_1, x_2)$ or $d(x_2, z) < d(x_1, x_2)$. Examples
that participate in Tomek links are either borderline or noisy
instances [30], [?]. As to the redundant instances, they will
not harm correct classification, as their existence will not
change the classification boundary, but they can lead to extra
classification costs. To remove the redundant instances, we
propose to create a consistent subset $C$ of the training set, e.g.,
$S$ [?]. Subset $C$ is consistent with $S$ if classifiers built with
$C$ can correctly classify instances in $S$. Initially, $C$ consists
of all positive instances and one randomly selected negative
instance. A classifier, e.g., kNN, built with $C$ is applied to
$S$, where instances that are misclassified are added into $C$. The
final set $C$ contains the safe links.
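
The Tomek link criterion can be sketched as a mutual nearest neighbor test: two instances of different labels form a Tomek link when each is the nearest neighbor of the other. The code below is a minimal illustration that assumes Euclidean distance and ignores distance ties; the function names and the toy data are hypothetical.

```python
import numpy as np


def tomek_links(features, labels):
    """Return index pairs (a, b), a < b, that form Tomek links."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances; a point is never its own neighbor.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)
    nearest = dist.argmin(axis=1)
    links = []
    for a, b in enumerate(nearest):
        # Mutual nearest neighbors with different labels form a Tomek link.
        if a < b and nearest[b] == a and labels[a] != labels[b]:
            links.append((a, int(b)))
    return links


def keep_after_tomek(features, labels, negative=-1):
    """Indices kept after dropping the negative member of every Tomek link."""
    labels = np.asarray(labels)
    drop = {i for pair in tomek_links(features, labels)
            for i in pair if labels[i] == negative}
    return [i for i in range(len(labels)) if i not in drop]


# Toy 1-D feature space (hypothetical): one negative sits next to a positive.
toy_x = [[0.0], [0.1], [5.0], [5.2], [9.0]]
toy_y = [-1, 1, -1, -1, -1]
links_found = tomek_links(toy_x, toy_y)
kept = keep_after_tomek(toy_x, toy_y)
```

In the toy data only the first two instances form a Tomek link, so down sampling removes the negative instance at index 0 and keeps the rest.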

Another method to overcome the class imbalance problem is
to over sample the minority class. Many over sampling methods
have been proposed, e.g., over sampling with replacement and
over sampling with "synthetic" instances [?]: the minority
class is over sampled by introducing new "synthetic" examples
along the line segments joining each minority class instance
to $m$ of its $k$ nearest minority class neighbors. Parameter $k$
is usually set to 5, while the value of $m$ can be determined
according to the ratio by which the minority class is over
sampled. For example, if the minority class needs to be over
sampled by 200%, then $m = 2$. The instance to be created between
a certain example $\mathbf{x}$ and one of its nearest neighbors $\mathbf{y}$ can be
denoted as $\mathbf{x} + \boldsymbol{\theta} \odot (\mathbf{y} - \mathbf{x})$, where $\mathbf{x}$ and $\mathbf{y}$ are the feature
vectors of the two instances, $\boldsymbol{\theta}$ is a coefficient vector
containing random numbers in range $[0, 1]$, and $\odot$ denotes the
element-wise product.
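
The synthetic over sampling step can be sketched as follows. This is a hedged illustration of the interpolation described above, not the exact procedure of the cited method: the function name, the seed handling and the toy data are hypothetical, Euclidean distance is assumed, and at least two minority instances are required.

```python
import numpy as np


def synthetic_instances(minority, k=5, m=2, seed=0):
    """Create m synthetic instances per minority instance by interpolation."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    k = min(k, len(minority) - 1)            # cannot have more neighbors than peers
    diff = minority[:, None, :] - minority[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)           # exclude each point from its own list
    neighbors = np.argsort(dist, axis=1)[:, :k]
    synthetic = []
    for idx, x in enumerate(minority):
        # Pick m of the k nearest minority neighbors (with reuse only if m > k).
        chosen = rng.choice(neighbors[idx], size=m, replace=(m > k))
        for y in minority[chosen]:
            theta = rng.random(x.shape)       # coefficients in [0, 1)
            synthetic.append(x + theta * (y - x))
    return np.array(synthetic)


toy_minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = synthetic_instances(toy_minority, k=3, m=2)
```

With `m = 2` the minority class grows by 200%: four toy instances yield eight synthetic ones, each lying coordinate-wise between an instance and one of its neighbors.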

IV. Anchor Link Pruning with Generic Stable 
Matching 

In this section, we introduce the anchor link pruning
methods in detail, which include (1) candidate pre-pruning,
(2) a brief introduction to the traditional stable matching, and
(3) the generic stable matching method proposed in this paper,
which generalizes the concept of traditional stable matching
through both self matching and partial stable matching.

A. Candidate Pre-Pruning 

Across two partially aligned social networks, users in a
certain network can have a large number of potential anchor
link candidates in the other network, which can lead to
great time and space costs in predicting the anchor links.
The problem can be even worse when the networks are of
large scale, e.g., containing millions or even billions of users,
which can make the partial network alignment problem unsolvable.
To shrink the size of the candidate set, we propose to conduct
candidate pre-pruning of links in the test set with users' profile
information (e.g., names and hometown).

As shown in the corresponding figure, in the given input test
set, users are extensively connected with all their potential
partners in other networks via anchor links. For each user, we
propose to prune their potential candidates according to the
following heuristics:

• profile pre-pruning: users' profile information shared
  across partially aligned social networks, e.g., Foursquare
  and Twitter, can include username and hometown [?].
  Given an anchor link $(u_l^{(i)}, u_m^{(j)}) \in \mathcal{L}$, if the username
  and hometown of $u_l^{(i)}$ and $u_m^{(j)}$ are totally different, e.g.,
  cosine similarity scores are 0, then link $(u_l^{(i)}, u_m^{(j)})$ will
  be pruned from test set $\mathcal{L}$.

• EAAF pruning: based on the explicit anchor adjacency
  tensor $\mathcal{X}$ extracted in Section III, for a given link
  $(u_l^{(i)}, u_m^{(j)}) \in \mathcal{L}$, if its extracted explicit anchor adjacency
  features are all 0, i.e., $\mathcal{X}(l, m, x) = 0, \forall x \in \{1, 2, \cdots, 8\}$,
  then link $(u_l^{(i)}, u_m^{(j)})$ will be pruned from test set $\mathcal{L}$.
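
The two heuristics can be sketched together as below. The sketch makes several assumptions that are not stated in the text: profiles are dictionaries with `username` and `hometown` strings, string similarity is the cosine similarity of character-bigram counts, and candidates are index pairs into the tensor. All names are hypothetical.

```python
from collections import Counter
import math

import numpy as np


def bigram_cosine(a, b):
    """Cosine similarity between character-bigram count vectors of two strings."""
    ca = Counter(a[i:i + 2] for i in range(len(a) - 1))
    cb = Counter(b[i:i + 2] for i in range(len(b) - 1))
    dot = sum(ca[g] * cb[g] for g in ca)
    norm = (math.sqrt(sum(v * v for v in ca.values()))
            * math.sqrt(sum(v * v for v in cb.values())))
    return dot / norm if norm else 0.0


def pre_prune(candidates, profiles_i, profiles_j, tensor):
    """Keep candidate links that survive profile pre-pruning and EAAF pruning."""
    kept = []
    for l, m in candidates:
        name_sim = bigram_cosine(profiles_i[l]["username"], profiles_j[m]["username"])
        town_sim = bigram_cosine(profiles_i[l]["hometown"], profiles_j[m]["hometown"])
        if name_sim == 0 and town_sim == 0:
            continue  # profile pre-pruning: nothing in common
        if not tensor[l, m].any():
            continue  # EAAF pruning: all explicit features are 0
        kept.append((l, m))
    return kept


# Toy data (hypothetical profiles and feature tensor).
profiles_i = [{"username": "william", "hometown": "san jose"},
              {"username": "rebecca", "hometown": "chicago"}]
profiles_j = [{"username": "will", "hometown": "san jose"},
              {"username": "xq", "hometown": "oslo"}]
toy_tensor = np.zeros((2, 2, 8))
toy_tensor[0, 0, 0] = 2
toy_tensor[1, 1, 3] = 1
all_pairs = [(l, m) for l in range(2) for m in range(2)]
survivors = pre_prune(all_pairs, profiles_i, profiles_j, toy_tensor)
```

Only the pair with overlapping profiles and at least one non-zero explicit feature survives, which is how pre-pruning shrinks the candidate set before classification and matching.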

B. Traditional Stable Matching 

Meanwhile, as proposed in [14], the one-to-one constraint of
anchor links across fully aligned social networks can be met by
pruning extra potential anchor link candidates with traditional
stable matching. In this subsection, we introduce the
concept of traditional stable matching briefly.

Given the user sets $\mathcal{U}^{(1)}$ and $\mathcal{U}^{(2)}$ of two partially aligned
social networks $G^{(1)}$ and $G^{(2)}$, each user in $\mathcal{U}^{(1)}$ (or $\mathcal{U}^{(2)}$) has
his preference over users in $\mathcal{U}^{(2)}$ (or $\mathcal{U}^{(1)}$). Term $v_j P^{(1)}_{u_i} v_k$ is
used to denote that $u_i \in \mathcal{U}^{(1)}$ prefers $v_j$ to $v_k$ for simplicity,
where $v_j, v_k \in \mathcal{U}^{(2)}$ and $P^{(1)}_{u_i}$ is the preference operator of
$u_i \in \mathcal{U}^{(1)}$. Similarly, we can use term $u_i P^{(2)}_{v_j} u_k$ to denote that
$v_j \in \mathcal{U}^{(2)}$ prefers $u_i$ to $u_k$ in $\mathcal{U}^{(1)}$ as well.

Definition 17 (Matching): Mapping $\mu: \mathcal{U}^{(1)} \cup \mathcal{U}^{(2)} \rightarrow \mathcal{U}^{(1)} \cup \mathcal{U}^{(2)}$
is defined to be a matching iff (1) $|\mu(u_i)| = 1, \forall u_i \in \mathcal{U}^{(1)}$
and $\mu(u_i) \in \mathcal{U}^{(2)}$; (2) $|\mu(v_j)| = 1, \forall v_j \in \mathcal{U}^{(2)}$ and $\mu(v_j) \in \mathcal{U}^{(1)}$;
(3) $\mu(u_i) = v_j$ iff $\mu(v_j) = u_i$.

Definition 18 (Blocking Pair): A pair $(u_i, v_j)$ is a blocking
pair of matching $\mu$ if $u_i$ and $v_j$ prefer each other to their
mapped partners, i.e., $(\mu(u_i) \neq v_j) \land (\mu(v_j) \neq u_i)$ and
$(v_j P^{(1)}_{u_i} \mu(u_i)) \land (u_i P^{(2)}_{v_j} \mu(v_j))$.

Definition 19 (Stable Matching): Given a matching $\mu$, $\mu$ is
stable if there is no blocking pair in the matching results [?].

As introduced in [?], the stable matching can be obtained
with the Gale-Shapley algorithm proposed in [?].
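
For reference, the classical Gale-Shapley deferred acceptance procedure can be sketched as follows. The sketch assumes two user sets of equal size with complete, strictly ordered preference lists (best partner first); the function name and the toy preference lists are hypothetical.

```python
def gale_shapley(prefs_u, prefs_v):
    """Return a stable matching {u: v}; users in prefs_u act as proposers."""
    # rank_v[v][u] is the position of u in v's preference list (0 = best).
    rank_v = {v: {u: r for r, u in enumerate(lst)} for v, lst in prefs_v.items()}
    next_choice = {u: 0 for u in prefs_u}   # next partner each u will propose to
    partner_of_v = {}
    free = list(prefs_u)
    while free:
        u = free.pop()
        v = prefs_u[u][next_choice[u]]
        next_choice[u] += 1
        current = partner_of_v.get(v)
        if current is None:
            partner_of_v[v] = u             # v was free and accepts u
        elif rank_v[v][u] < rank_v[v][current]:
            partner_of_v[v] = u             # v prefers u and releases current
            free.append(current)
        else:
            free.append(u)                  # v rejects u, who proposes again later
    return {u: v for v, u in partner_of_v.items()}


prefs_u = {"u1": ["v1", "v2"], "u2": ["v1", "v2"]}
prefs_v = {"v1": ["u2", "u1"], "v2": ["u1", "u2"]}
matching = gale_shapley(prefs_u, prefs_v)
```

Both proposers prefer `v1`, but `v1` prefers `u2`, so `u1` ends up with `v2`; no pair in the result prefers each other to their assigned partners.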


C. Generic Stable Matching 

The stable matching based method proposed in [14] can only
work well in fully aligned social networks. However, in the real
world, few social networks are fully aligned and lots of users
in social networks are involved in one network only, i.e.,
non-anchor users, and they should not be connected by any anchor
links. Traditional stable matching cannot
identify these non-anchor users and remove the predicted
potential anchor links connected with them. To overcome this
problem, we introduce the generic stable matching to
identify the non-anchor users and prune the anchor link results
to meet the one-to-one$_\leq$ constraint.

In PNA, we introduce a novel concept, self matching, which
allows users to be mapped to themselves if they are discovered
to be non-anchor users. In other words, we identify the
non-anchor users as those who are mapped to themselves in
the final matching results.

Definition 20 (Self Matching): For the given two partially
aligned networks $G^{(1)}$ and $G^{(2)}$, user $u_i \in \mathcal{U}^{(1)}$ can have his
preference $P^{(1)}_{u_i}$ over users in $\mathcal{U}^{(2)} \cup \{u_i\}$, and $u_i$ preferring $u_i$
himself denotes that $u_i$ is a non-anchor user and prefers to
stay unconnected, which is formally defined as self matching.

Users in one social network will be matched with either 
partners in other social networks or themselves according to 
their preference lists (i.e., from high preference scores to 
low preference scores). Only partners that users prefer over 
themselves will be accepted finally, otherwise users will be 
matched with themselves instead. 

Definition 21 (Acceptable Partner): For a given matching
$\mu: \mathcal{U}^{(1)} \cup \mathcal{U}^{(2)} \rightarrow \mathcal{U}^{(1)} \cup \mathcal{U}^{(2)}$, the mapped partner of user
$u_i \in \mathcal{U}^{(1)}$, i.e., $\mu(u_i)$, is acceptable to $u_i$ iff $\mu(u_i) P^{(1)}_{u_i} u_i$.

To cut off the partners with very low preference scores, we
propose the partial matching strategy to obtain the promising
partners, who will participate in the final matching.

Definition 22 (Partial Matching Strategy): The partial matching
strategy of user $u_i \in \mathcal{U}^{(1)}$, i.e., $Q^{(1)}_{u_i}$, consists of the first
$K$ acceptable partners in $u_i$'s preference list $P^{(1)}_{u_i}$, which
are in the same order as those in $P^{(1)}_{u_i}$, and $u_i$ in the $(K+1)$-th
entry of $Q^{(1)}_{u_i}$. Parameter $K$ is called the partial matching rate
in this paper.

An example is given in the corresponding figure, where to get
the top 2 promising partners for the user, we place the user
himself at the 3rd cell in the preference list. All the remaining
potential partners will be cut off and only the top 3 users will
participate in the final matching.

Based on the concepts of self matching and partial matching
strategy, we define the concepts of partial stable matching and
generic stable matching as follows.

Definition 23 (Partial Stable Matching): For a given matching
$\mu$, $\mu$ is (1) rational if $\mu(u_i) Q^{(1)}_{u_i} u_i, \forall u_i \in \mathcal{U}^{(1)}$ and
$\mu(v_j) Q^{(2)}_{v_j} v_j, \forall v_j \in \mathcal{U}^{(2)}$; (2) pairwise stable if there exist
no blocking pairs in the matching results; and (3) stable if it
is both rational and pairwise stable.

Definition 24 (Generic Stable Matching): For a given matching
$\mu$, $\mu$ is a generic stable matching iff $\mu$ is a self matching
or $\mu$ is a partial stable matching.

TABLE I
Properties of the Heterogeneous Networks

                                  network
          property          Twitter      Foursquare
# node    user              5,223        5,392
          tweet/tip         9,490,707    48,756
          location          297,182      38,921
# link    friend/follow     164,920      76,972
          write             9,490,707    48,756
          locate            615,515      48,756

An example of generic stable matching is shown in the
bottom two plots of the accompanying figure. Traditional stable matching can
prune most non-existing anchor links and ensure that the results
meet the one-to-one constraint. However, it preserves the
anchor links (Rebecca, Becky) and (Jonathan, Jon), which
connect non-anchor users. In generic stable matching with
parameter $K = 1$, users will either be connected with their
most preferred partner or stay unconnected. Users “William”
and “Wm” are matched, as link (William, Wm) has the highest
score. “Rebecca” and “Jonathan” will prefer to stay unconnected,
as their most preferred partner “Wm” is already connected with
“William”. Furthermore, “Becky” and “Jon” will stay
unconnected, as their most preferred partners “Rebecca” and
“Jonathan” prefer to stay unconnected. In this way, generic
stable matching can further prune the non-existing anchor links
(Rebecca, Becky) and (Jonathan, Jon).

The truncated generic stable matching results can be
achieved with the Generic Gale-Shapley algorithm as given
in Algorithm 1.


V. Experiments 

To demonstrate the effectiveness of PNA in predicting anchor
links for partially aligned heterogeneous social networks,
we conduct extensive experiments on two real-world heterogeneous
social networks: Foursquare and Twitter. This section
includes three parts: (1) dataset description, (2) experiment
settings, and (3) experiment results.

A. Dataset Description 

The datasets used in this paper include Foursquare and
Twitter, which were crawled during November 2012 [?], [?],
[22], [42]. More detailed information about these two datasets
is shown in Table I and in [?], [36], [22], [?]. The number
of anchor links crawled between Foursquare and Twitter is
3,388, and 62.83% of Foursquare users are anchor users.
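
As a quick check of the stated share, and assuming the user counts in Table I are the denominators, the anchor-user ratios work out as follows (the Twitter figure is derived here and is not stated in the text):

\[
\frac{3{,}388}{5{,}392} \approx 0.6283 \quad (\text{Foursquare}), \qquad \frac{3{,}388}{5{,}223} \approx 0.6487 \quad (\text{Twitter}).
\]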


B. Experiment Settings 

In this part, we describe the experiment settings in detail,
which include: (1) comparison methods, (2) evaluation
methods, and (3) experiment setups.


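Algorithm 1 Generic Gale-Shapley Algorithm

Input: user sets of aligned networks: $\mathcal{U}^{(1)}$ and $\mathcal{U}^{(2)}$
       classification results of potential anchor links in $\mathcal{L}$
       known anchor links between the two networks
       truncation rate $K$
Output: a set of inferred anchor links $\mathcal{L}'$

Initialize the preference lists of users in $\mathcal{U}^{(1)}$ and $\mathcal{U}^{(2)}$ with
    predicted existence probabilities of links in $\mathcal{L}$ and known
    anchor links, whose existence probabilities are 1.0
Construct the truncated strategies from the preference lists
Initialize all users in $\mathcal{U}^{(1)}$ and $\mathcal{U}^{(2)}$ as free
$\mathcal{L}' = \emptyset$
while $\exists$ free $u_i^{(1)}$ in $\mathcal{U}^{(1)}$ and $u_i^{(1)}$'s truncated strategy is non-empty do
    Remove the top-ranked account $u_j^{(2)}$ from $u_i^{(1)}$'s truncated strategy
    if $u_j^{(2)} = u_i^{(1)}$ then
        $\mathcal{L}' = \mathcal{L}' \cup \{(u_i^{(1)}, u_i^{(1)})\}$
        Set $u_i^{(1)}$ as stay unconnected
    else
        if $u_j^{(2)}$ is free then
            $\mathcal{L}' = \mathcal{L}' \cup \{(u_i^{(1)}, u_j^{(2)})\}$
            Set $u_i^{(1)}$ and $u_j^{(2)}$ as occupied
        else
            $\exists u_p^{(1)}$ that $u_j^{(2)}$ is occupied with
            if $u_j^{(2)}$ prefers $u_i^{(1)}$ to $u_p^{(1)}$ then
                $\mathcal{L}' = (\mathcal{L}' - \{(u_p^{(1)}, u_j^{(2)})\}) \cup \{(u_i^{(1)}, u_j^{(2)})\}$
                Set $u_p^{(1)}$ as free and $u_i^{(1)}$ as occupied
            end if
        end if
    end if
end while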
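
A minimal Python sketch of Algorithm 1 may help make the control flow concrete. It is an illustration under several assumptions rather than the authors' implementation: the input format (a dictionary of link scores), the function name `generic_gale_shapley`, and the toy scores are all hypothetical; the self entry of each truncated strategy is kept implicit (an exhausted strategy means "stay unconnected"); and the check that a network-2 user rejects proposers outside its own truncated strategy is one reading of how truncation applies on that side. The user names follow the William/Wm example from the text, with made-up scores chosen to respect the preferences described there.

```python
from collections import deque


def generic_gale_shapley(scores, k):
    """Match users across two networks using truncated preference lists.

    scores maps (u1, u2) -> predicted existence probability of the link;
    known anchor links are expected to carry probability 1.0.
    Returns (links, unconnected): inferred anchor links and the network-1
    users who end up staying unconnected.
    """
    users1 = sorted({u1 for u1, _ in scores})
    users2 = sorted({u2 for _, u2 in scores})

    def top_k(user, side):
        # Candidates ranked by descending score, truncated to the first k.
        if side == 1:
            cands = [(s, u2) for (u1, u2), s in scores.items() if u1 == user]
        else:
            cands = [(s, u1) for (u1, u2), s in scores.items() if u2 == user]
        cands.sort(key=lambda pair: -pair[0])
        return [c for _, c in cands[:k]]

    # Proposers consume their strategy as a queue; receivers use theirs as
    # a rank table (partners outside it are treated as unacceptable).
    strategy1 = {u: deque(top_k(u, 1)) for u in users1}
    rank2 = {v: {u: r for r, u in enumerate(top_k(v, 2))} for v in users2}

    partner_of = {}  # network-2 user -> network-1 user currently holding it
    unconnected = set()
    free = deque(users1)
    while free:
        u = free.popleft()
        if not strategy1[u]:
            unconnected.add(u)  # only the implicit self entry is left
            continue
        v = strategy1[u].popleft()
        if u not in rank2[v]:
            free.append(u)  # v finds u unacceptable
        elif v not in partner_of:
            partner_of[v] = u  # v is free: accept
        elif rank2[v][u] < rank2[v][partner_of[v]]:
            free.append(partner_of[v])  # v trades up; old partner is freed
            partner_of[v] = u
        else:
            free.append(u)  # v keeps its current partner
    return {(u, v) for v, u in partner_of.items()}, unconnected


# Hypothetical scores consistent with the example in the text.
scores = {
    ("William", "Wm"): 0.9, ("William", "Becky"): 0.3, ("William", "Jon"): 0.2,
    ("Rebecca", "Wm"): 0.7, ("Rebecca", "Becky"): 0.6, ("Rebecca", "Jon"): 0.1,
    ("Jonathan", "Wm"): 0.8, ("Jonathan", "Jon"): 0.5, ("Jonathan", "Becky"): 0.1,
}
links_k1, alone_k1 = generic_gale_shapley(scores, k=1)
links_k3, alone_k3 = generic_gale_shapley(scores, k=3)
```

With these toy scores, `k=1` keeps only (William, Wm) and leaves Rebecca and Jonathan unconnected, while `k=3` (no effective truncation for three candidates) also returns (Rebecca, Becky) and (Jonathan, Jon), which is the behaviour the text attributes to traditional stable matching.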


1) Comparison Methods: The comparison methods used in
the experiments can be divided into the following 4 categories:

Methods with Generic Stable Matching:

• PNAomg: PNAomg (PNA with Over sampling &
Generic stable Matching) is the method proposed in this
paper, which consists of two steps: (1) class imbalance
link prediction with over sampling, and (2) candidate
pruning with generic stable matching.

• PNAdmg: PNAdmg (PNA with Down sampling &
Generic stable Matching) is another method proposed
in this paper, which consists of two steps: (1) class
imbalance link prediction with down sampling, and (2)
candidate pruning with generic stable matching.

Methods with Traditional Stable Matching:

• PNAom: PNAom (PNA with Over sampling & traditional
stable Matching) is identical to PNAomg except
that in the second step, PNAom applies the traditional
stable matching [?], [?].

• PNAdm: PNAdm (PNA with Down sampling & traditional
stable Matching) is identical to PNAdmg except
that in the second step, PNAdm applies the traditional
stable matching [?], [?].

Class Imbalance Anchor Link Prediction:

• PNAo: PNAo (PNA with Over sampling) is the link
prediction method with over sampling to overcome the
class imbalance problem and has no matching step.

• PNAd: PNAd (PNA with Down sampling) is the link
prediction method with down sampling to overcome the
class imbalance problem and has no matching step.

Existing Network Anchoring Methods:

• Mna: Mna (Multi-Network Anchoring) is a two-phase
method proposed in [?], which includes: (1) supervised
link prediction without addressing the class imbalance
problem; (2) traditional stable matching [?], [?].

• MNA_no: MNA_no (Mna without one-to-one constraint)
is the first step of Mna proposed in [?], which predicts
anchor links without addressing the class imbalance
problem and has no matching step.

Fig. 5. AUC of different class imbalance link prediction methods. (a) AUC: alignment rate; (b) AUC: negative positive rate.

2) Evaluation Metrics: The output of different link prediction
methods can be either predicted labels or confidence
scores, which are evaluated by Accuracy, AUC, and F1 in the
experiments.
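
To make the two label-based metrics concrete, the sketch below computes Accuracy and F1 from predicted labels and shows how they can diverge on imbalanced data. It is a hedged illustration: the helper name `accuracy_and_f1` and the toy label vectors are hypothetical, the 50:1 class ratio is chosen only to mirror the most imbalanced setting used in the experiments, and AUC is left out because it needs confidence scores rather than labels.

```python
def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0  # F1 = 2TP / (2TP + FP + FN)
    return accuracy, f1


# Toy test set: 10 existing anchor links and 500 non-existing ones (50:1).
y_true = [1] * 10 + [0] * 500

# A degenerate classifier that labels every candidate link as non-existing.
all_negative = [0] * 510
acc_trivial, f1_trivial = accuracy_and_f1(y_true, all_negative)

# A classifier that recovers 6 of the 10 positives with 4 false alarms.
useful = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 496
acc_useful, f1_useful = accuracy_and_f1(y_true, useful)
```

The all-negative classifier reaches an Accuracy of 500/510 while its F1 is 0, whereas the second classifier has only slightly higher Accuracy but a far higher F1; this gap is why both metrics are reported side by side.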

3) Experiment Setups: In the experiment, initially, a fully
aligned network containing 3000 users in both Twitter and
Foursquare is sampled from the datasets. All the existing
anchor links are grouped into the positive link set and all the
possible non-existing anchor links are used as the potential
link set. A certain number of links are randomly sampled
from the potential link set as the negative link set, which
is controlled by parameter $\theta$. Parameter $\theta$ represents the
$\frac{\#negative}{\#positive}$ ratio, where $\theta = 1$ denotes the class balance case,
i.e., $\#positive$ equals $\#negative$; $\theta = 50$ represents the
case that the negative instance set is 50 times as large as the
positive instance set, i.e., $\#negative = 50 \times \#positive$. In the
experiment, $\theta$ is chosen from {1, 2, 3, 4, 5, 10, 20, 30, 40, 50}.
Links in the positive and negative link sets are partitioned
into two parts with 10-fold cross validation, where 9 folds
are used as the training set and 1 fold is used as the test set.
To simulate partially aligned networks, certain positive
links are randomly sampled from the positive training set as
the final positive training set under the control of parameter $\eta$.
$\eta$ is chosen from {0.1, 0.2, ..., 1.0}, where $\eta = 0.1$ denotes
that the networks are 10% aligned and $\eta = 1.0$ shows that the
networks are fully aligned. With links in the positive training
set, anchor adjacency tensor based features and the latent
feature vectors are extracted from the network to build link
prediction model $\mathcal{M}$. In building model $\mathcal{M}$, over sampling and
under sampling techniques are applied and the sampling rate is
determined by parameter $\sigma \in$ {0.0, 0.1, 0.2, ..., 1.0}, where
$\sigma = 0.3$ denotes that $0.3 \times (\#negative - \#positive)$ negative
links are randomly removed from the negative link set in under
sampling, or $0.3 \times (\#negative - \#positive)$ positive links are
generated and added to the positive link set in over sampling.
Before applying model $\mathcal{M}$ to the test set, a pre-pruning process
is conducted on the test set. Based on the prediction
results of model $\mathcal{M}$ on the test set, post-pruning with generic
stable matching is applied to further prune the non-existent
candidates to ensure that the final prediction results across
the partially aligned networks can meet the one-to-one$_{\leq}$
constraint controlled by the partial matching parameter $K$.
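
The sampling arithmetic in this setup can be sketched as follows. This is a simplified illustration, not the paper's pipeline: all function names and the toy link sets are hypothetical, the negative-to-positive ratio and the sampling rate are passed as `theta` and `sigma`, and the over-sampling step simply duplicates existing positives at random as a stand-in for generating new positive instances.

```python
import random


def sample_negative_links(potential_links, n_positive, theta, rng):
    """Draw theta * n_positive negatives from the potential (non-existing) links."""
    return rng.sample(potential_links, theta * n_positive)


def under_sample(negatives, n_positive, sigma, rng):
    """Randomly remove sigma * (#negative - #positive) negative links."""
    n_remove = round(sigma * (len(negatives) - n_positive))
    return rng.sample(negatives, len(negatives) - n_remove)


def over_sample(positives, n_negative, sigma, rng):
    """Add sigma * (#negative - #positive) positive links.

    Stand-in only: duplicates existing positives at random instead of
    generating new synthetic instances.
    """
    n_add = round(sigma * (n_negative - len(positives)))
    return positives + rng.choices(positives, k=n_add)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
positives = [(i, i) for i in range(100)]  # toy anchor links
potential = [(i, j) for i in range(100) for j in range(100) if i != j]

negatives = sample_negative_links(potential, len(positives), theta=10, rng=rng)
reduced_negatives = under_sample(negatives, len(positives), sigma=0.3, rng=rng)
enlarged_positives = over_sample(positives, len(negatives), sigma=0.3, rng=rng)
```

With 100 positives, a ratio of 10 and a sampling rate of 0.3, the gap between the classes is 900 links, so under sampling removes 270 negatives (730 remain) and over sampling adds 270 positives (370 in total).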

C. Experiment Results 

In this part, we will give the experiment results of all 
these comparison methods in addressing the partial network 
alignment problem. This part includes (1) analysis of sampling 
methods in class imbalance link prediction; (2) performance 
comparison of different link prediction methods; and (3) 
parameter analysis. 

1) Analysis of Sampling Methods: To examine whether
sampling methods can improve the prediction performance on
the imbalanced classification problem, we compare
PNAo and PNAd with MNA_no; the results are given in
Figure 5. In Figure 5(a), we fix $\theta$ as 10 but change $\eta$ with values in
{0.1, 0.2, ..., 1.0} and compare the AUC achieved by PNAo,
PNAd and MNA_no. We can observe that the AUC values of
all three methods increase with $\eta$, and that
PNAo and PNAd perform consistently better than MNA_no.

(c) Acc. @ $\theta = 50$, $\eta = 0.9$
Fig. 6. F1 and Accuracy of PNAomg and PNAdmg with different partial matching rates.


In Figure 5(b), we fix $\eta$ as 0.6 but change $\theta$ with values in
{1, 2, 3, 4, 5, 10, 20, 30, 40, 50} and compare the AUC of
PNAo, PNAd and MNA_no. As shown in Figure 5(b), the
performance of PNAo, PNAd and MNA_no varies only
slightly as $\theta$ changes from 1 to 50, and PNAo and PNAd
consistently achieve better performance than MNA_no.

2) Comparison of Different Link Prediction Methods:
Meanwhile, as generic stable matching based post pruning
can only output the labels of potential anchor links in the test
set, we also evaluate all these methods by comparing their
Accuracy and F1 score in Tables II and III. In Table II, we fix $\theta$ as
10 and $K$ as 5 but change $\eta$ with values in {0.1, 0.2, ..., 1.0}.
Table II has two parts. The upper part of Table II shows the
Accuracy achieved by all the methods with various $\eta$, and the
lower part shows the F1 score. Generally, the performance of
all comparison methods rises as $\eta$ increases. In the upper part,
PNAomg and PNAdmg consistently perform
better than all other comparison methods for different $\eta$. For
example, when $\eta = 0.5$, the Accuracy achieved by PNAomg
is higher than PNAom by 3.45%, higher than Mna by 6.0%,
higher than PNAo by 7.51% and higher than MNA_no by
7.75%; meanwhile, the Accuracy achieved by PNAdmg is
higher than PNAdm, Mna, PNAd and MNA_no as well. The
advantages of PNAomg and PNAdmg over other comparison
methods are more obvious under the evaluation of F1, as
Accuracy is no longer an appropriate evaluation metric in
class imbalance settings [?]. For example, when $\eta = 0.5$, the F1
achieved by PNAomg is about 13.25% higher than PNAom,
24% higher than Mna, 101.6% higher than PNAo and 165%
higher than MNA_no; the same holds for PNAdmg.
The experiment results show that PNAomg and PNAdmg
work well with datasets containing different ratios of anchor
links across the networks. Similar results can be obtained
from Table III, where we fix $\eta = 0.6$ and $K$ as 5 but change
$\theta$ with values in {1, 2, 3, 4, 5, 10, 20, 30, 40, 50}. It shows that
PNAomg and PNAdmg can effectively address the class
imbalance problem.
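
The percentages quoted above appear to be relative improvements over each baseline. The sketch below recomputes them from the alignment-rate 0.5 column of Table II; the helper name `relative_gain` is hypothetical, and reading the percentages as relative gains is an interpretation checked against the table rather than something the text states outright.

```python
def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return 100.0 * (ours - baseline) / baseline


# Table II values at an alignment rate of 0.5.
acc = {"PNAomg": 0.987, "PNAom": 0.954, "Mna": 0.931, "PNAo": 0.918, "MNA_no": 0.916}
f1 = {"PNAomg": 0.615, "PNAom": 0.543, "Mna": 0.496, "PNAo": 0.305, "MNA_no": 0.232}

acc_gain = {m: relative_gain(acc["PNAomg"], v) for m, v in acc.items() if m != "PNAomg"}
f1_gain = {m: relative_gain(f1["PNAomg"], v) for m, v in f1.items() if m != "PNAomg"}
```

The recomputed Accuracy gains are roughly 3.46%, 6.02%, 7.52% and 7.75%, and the F1 gains roughly 13.26%, 23.99%, 101.64% and 165.09%, which agree with the figures quoted in the text up to rounding.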

The fact that (1) PNAomg outperforms PNAom
(and PNAdmg outperforms PNAdm) shows that generic stable
matching works well in dealing with partially aligned
social networks; (2) PNAom beats PNAo (and PNAdm
beats PNAd) means that stable matching can achieve very
good post-pruning results; (3) PNAom and PNAdm
perform better than Mna (and PNAo and PNAd achieve
better results than MNA_no) means that sampling methods can
overcome the class imbalance problem very well.

3) Analysis of Partial Matching Rate: In the generic stable
matching, only top $K$ anchor link candidates will be
preserved. In this part, we will analyze the effects of parameter
$K$ on the performance of PNAomg and PNAdmg.
Figure 6 gives the results (both Accuracy and F1) of
PNAomg and PNAdmg by setting parameter $K$ with values
in {1, 2, 3, 4, 5, 10, 20, 30, 40, 50}.

TABLE II
Performance comparison of different methods for partial network alignment with different network alignment rates.

                    anchor link sampling rate $\eta$
Measure  Methods    0.1    0.2    0.3    0.4    0.5    0.6    0.7    0.8    0.9    1.0
Acc      PNAomg     0.964  0.966  0.973  0.967  0.987  0.989  0.981  0.988  0.989  0.990
         PNAdmg     0.960  0.974  0.961  0.976  0.983  0.975  0.982  0.989  0.986  0.990
         PNAom      0.942  0.938  0.948  0.945  0.954  0.960  0.970  0.968  0.983  0.981
         PNAdm      0.940  0.951  0.949  0.929  0.949  0.947  0.969  0.966  0.983  0.981
         Mna        0.917  0.918  0.922  0.922  0.931  0.937  0.940  0.943  0.949  0.971
         PNAo       0.905  0.907  0.915  0.915  0.918  0.927  0.926  0.925  0.929  0.921
         PNAd       0.905  0.908  0.911  0.912  0.915  0.926  0.923  0.925  0.929  0.923
         MNA_no     0.895  0.899  0.901  0.907  0.916  0.921  0.922  0.924  0.919  0.922
F1       PNAomg     0.280  0.375  0.442  0.496  0.615  0.717  0.776  0.843  0.941  0.965
         PNAdmg     0.283  0.374  0.412  0.481  0.589  0.658  0.783  0.848  0.925  0.972
         PNAom      0.230  0.318  0.384  0.452  0.543  0.638  0.723  0.824  0.916  0.963
         PNAdm      0.239  0.324  0.369  0.424  0.526  0.593  0.716  0.812  0.919  0.963
         Mna        0.211  0.267  0.375  0.420  0.496  0.578  0.705  0.782  0.899  0.943
         PNAo       0.014  0.054  0.211  0.210  0.305  0.402  0.413  0.385  0.428  0.438
         PNAd       0.010  0.048  0.131  0.165  0.257  0.380  0.365  0.367  0.405  0.438
         MNA_no     0.004  0.021  0.042  0.067  0.232  0.322  0.339  0.346  0.360  0.380

TABLE III
Performance comparison of different methods for partial network alignment with different negative positive rates.

                    negative positive rate $\theta$
Measure  Methods    1      2      3      4      5      10     20     30     40     50
Acc      PNAomg     0.941  0.900  0.903  0.904  0.905  0.989  0.995  0.995  0.998  0.997
         PNAdmg     0.920  0.917  0.903  0.913  0.893  0.975  0.994  0.998  0.997  0.997
         PNAom      0.934  0.898  0.899  0.882  0.898  0.960  0.975  0.981  0.992  0.995
         PNAdm      0.916  0.914  0.892  0.910  0.887  0.947  0.977  0.981  0.990  0.990
         Mna        0.914  0.863  0.884  0.886  0.878  0.937  0.966  0.970  0.978  0.986
         PNAo       0.706  0.795  0.834  0.849  0.880  0.927  0.958  0.970  0.976  0.980
         PNAd       0.752  0.812  0.836  0.865  0.875  0.926  0.955  0.968  0.976  0.980
         MNA_no     0.714  0.781  0.825  0.839  0.873  0.921  0.953  0.968  0.975  0.980
F1       PNAomg     0.943  0.870  0.835  0.805  0.776  0.717  0.608  0.552  0.565  0.524
         PNAdmg     0.926  0.890  0.834  0.821  0.754  0.658  0.602  0.577  0.548  0.533
         PNAom      0.936  0.867  0.832  0.772  0.769  0.638  0.550  0.470  0.438  0.366
         PNAdm      0.923  0.887  0.822  0.819  0.747  0.593  0.563  0.468  0.419  0.405
         Mna        0.887  0.800  0.790  0.760  0.694  0.578  0.508  0.397  0.346  0.329
         PNAo       0.600  0.609  0.553  0.515  0.492  0.402  0.294  0.251  0.131  0.051
         PNAd       0.687  0.633  0.569  0.528  0.455  0.380  0.230  0.131  0.093  0.067
         MNA_no     0.575  0.542  0.526  0.483  0.447  0.322  0.204  0.105  0.075  0.041


In Figures 6(a) and 6(b), parameters $\theta$ and $\eta$ are fixed as 5
and 0.4, respectively. From the results, we observe that both
PNAomg and PNAdmg perform very well when $K$ is
small, and the best performance is obtained at $K = 1$. This shows that the
anchor link candidates with the highest confidence predicted
by PNAo and PNAd are the optimal network alignment
results when $\theta$ and $\eta$ are low. In Figures 6(c) and 6(d), we set
$\eta$ as 0.9 and $\theta$ as 50 (i.e., the networks contain more anchor
links and the training/test sets become more imbalanced). We
find that the performance of both PNAomg and PNAdmg
first increases, then decreases, and finally stays stable as $K$
increases, which shows that the optimal anchor link candidates
are those within the top $K$ candidate set rather than the one
with the highest confidence as the training/test sets become
more imbalanced.

In addition, the partial matching strategy can shrink the
preference lists of users considerably, which leads to lower time
cost as shown in Figure 7, especially for the smaller $K$ values,
which also lead to better accuracy as shown in Figure 6.

Results in all these figures show that generic stable matching
can effectively prune the redundant candidate links and
significantly improve the prediction results.

Fig. 7. Time cost of PNAomg and PNAdmg with different partial matching
rates.

VI. Related Works

Aligned social network studies have become a hot research
topic in recent years. Kong et al. [?] are the first to propose
the anchor link prediction problem in fully aligned social
networks. Zhang et al. [36], [37], [45], [40] propose to
predict links for new users and new networks by transferring
heterogeneous information across aligned social networks. A
comprehensive survey about link prediction problems across
multiple social networks is available in [38]. In addition to
link prediction problems, Jin and Zhang et al. [?], [39], [?]
introduce the community detection problems across aligned
networks and Zhan et al. [?] study the information diffusion
across aligned social networks.

Meta path, first proposed by Sun et al. [?], has become a
powerful tool, which can be applied in either link prediction
problems [26], [27] or clustering problems [28], [?]. Sun et al.
[26] propose to predict co-author relationships in heterogeneous
bibliographic networks based on meta path. Sun et al. extend
the link prediction model to a relationship prediction model
based on meta path in [27]. Sun et al. [?] propose to calculate
the similarity scores among users based on meta path in
bibliographical networks. Sun et al. [25] also apply meta path
in the clustering problem of heterogeneous information networks
with incomplete attributes.

Tensors have been widely used in social network studies.
Moghaddam et al. [?] propose to apply an extended tensor
factorization model for personalized prediction of review
helpfulness. Liu et al. [?] present a tensor-based framework
for integrating heterogeneous multi-view data in the context
of spectral clustering. A more detailed tutorial about tensor
decomposition and applications is available in [?].

Class imbalance problems in classification can be very
common in real-world applications. Chawla et al. [?] propose a
technique for over-sampling the minority class with newly
generated synthetic minority instances. Kubat et al. [?] propose to
address the class imbalance problems with under sampling of
the majority cases in the training set. A systematic study of
the class imbalance problem is available in [?].

The college admission problem [?] and the stable marriage problem
[?] have been studied for many years, and many works
were done in the last century. In recent years, some new
papers have come out in these areas. Sotomayor et al. [?]
propose to analyze the stability of the equilibrium outcomes
in the admission games induced by stable matching rules.
Ma [?] analyzes the truncation in stable matching and the
small core in Nash equilibrium in college admission problems.
Floreen et al. [?] propose to study the almost stable matching
by truncating the Gale-Shapley algorithm.

VII. Conclusion 

In this paper, we study the partial network alignment 
problem across partially aligned social networks. To address 
the challenges of the studied problem, a new method PNA is 
proposed in this paper. PNA can extract features for anchor 
links based on a set of anchor meta paths and overcome 
the class imbalance problem with over sampling and down 
sampling. PNA can effectively prune the non-existing anchor
links with generic stable matching to ensure the results can
meet the one-to-one$_{\leq}$ constraint. Extensive experiments
done on two real-world partially aligned networks show 
the superior performance of PNA in addressing the partial 
network alignment problem. 